# Audio-visual QA

Ola-7B is a multimodal language model jointly developed by Tencent, Tsinghua University, and Nanyang Technological University, built on the Qwen2.5 architecture. It accepts image, video, audio, and text inputs and produces text output.

Tags: Multimodal Fusion, Safetensors, Supports Multiple Languages